Colors of Vancouver

Foreword

This notebook presents an exploratory data analysis (EDA) of the Vancouver tree dataset. The data are a randomly generated subset of the original data from the City of Vancouver's Open Data Portal, so they may or may not be a representative sample of the original data set, and the analysis done here may not give the full picture of what we are trying to find out. The data are published under the Open Government Licence – Vancouver.

Introduction

Questions of Interest

The tagline for the province of British Columbia is "Beautiful British Columbia". The west coast of British Columbia, including Vancouver, has a moderate climate, which makes it a very good tourist destination year-round. Spring and fall are especially colorful in Vancouver. The following questions are of interest for the EDA:

1. Find the neighbourhoods where fall foliage and spring blooms are most commonly observed.
2. Find the streets where fall foliage and spring blooms are dominant within the neighbourhoods from question 1.
3. For a given street, confirm whether the blocks on which the trees are located are close to each other.

Wrangling the Data

Here the vancouver_trees.csv dataset is read and stored in an object named tree_data_all. The date_planted column is converted to a datetime dtype using the parse_dates argument when the file is read. Since we are interested in the trees on the streets of the city, we filter the dataset for trees on the curb alone.
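As a minimal sketch of this step, the snippet below reads a tiny made-up sample in place of vancouver_trees.csv. The column names (curb, date_planted), the "Y"/"N" coding of curb and the name trees_on_curb are assumptions for illustration, not taken from the notebook's code.

```python
import io
import pandas as pd

# Tiny made-up sample standing in for vancouver_trees.csv (illustrative values only).
sample_csv = io.StringIO(
    "tree_id,genus_name,on_street,curb,date_planted\n"
    "1,ACER,W 6TH AV,Y,2004-03-15\n"
    "2,PRUNUS,RUPERT ST,N,1999-11-02\n"
    "3,PRUNUS,DUMFRIES ST,Y,\n"
)

# parse_dates is a keyword argument of read_csv; missing dates become NaT.
tree_data_all = pd.read_csv(sample_csv, parse_dates=["date_planted"])

# Keep only trees on the curb (assumes the column is coded "Y"/"N").
trees_on_curb = tree_data_all[tree_data_all["curb"] == "Y"]
```

With the real file, the StringIO object would simply be replaced by the path to the CSV.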

The maximum height range of the trees is 10 and the mean height range is 2.7. The maximum diameter of 317 may be an outlier, so we drop that value because it can distort the overall calculations.
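One way to inspect these summary statistics and drop the suspicious maximum is sketched below. The data frame is a made-up stand-in, and treating only the single largest diameter as the outlier is an assumption about how the value was removed.

```python
import pandas as pd

# Made-up rows; only the 317 mirrors the value mentioned in the text.
trees = pd.DataFrame({
    "height_range_id": [1, 2, 3, 10],
    "diameter": [3.0, 12.5, 8.0, 317.0],
})

# describe() reports count, mean, min, max and quartiles per numeric column.
summary = trees[["height_range_id", "diameter"]].describe()

# Drop the row(s) holding the maximum diameter (assumed to be the outlier).
trees_no_outlier = trees[trees["diameter"] != trees["diameter"].max()]
```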

Next we filter the dataset down to the required columns alone, as shown.

To answer the first question of interest, the dataset needs to be filtered for the fall and spring genera. The genera that lose their leaves in autumn and those that bloom in spring were identified manually.
The spring genera are stored in a list named list_flowering. Similarly, the fall genera are stored in a list named list_decidous.

The next step is to filter the dataframe to get separate dataframes for fall and spring.
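A sketch of this filtering step using isin is given below. The contents of list_flowering and list_decidous here are a few placeholder genera, not the manually compiled lists from the notebook, and the names spring_trees and fall_trees are hypothetical.

```python
import pandas as pd

# Placeholder lists: a few example genera, NOT the notebook's actual lists.
list_flowering = ["PRUNUS", "MAGNOLIA"]
list_decidous = ["ACER", "QUERCUS"]

# Made-up rows standing in for the curb-filtered tree data.
trees = pd.DataFrame({
    "genus_name": ["ACER", "PRUNUS", "PINUS", "MAGNOLIA", "QUERCUS"],
    "neighbourhood_name": ["Sunset", "Sunset", "Kitsilano", "Kitsilano", "Sunset"],
})

# isin keeps the rows whose genus appears in the given list.
spring_trees = trees[trees["genus_name"].isin(list_flowering)]
fall_trees = trees[trees["genus_name"].isin(list_decidous)]
```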

The information for the fall and spring datasets is obtained as follows.

From this information it can be seen that the columns of interest to us do not contain any null values. Hence, after using groupby, the grouped count can be found with the size method (size counts every row in a group, so it matches a count of non-null values only when nothing is missing).
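To illustrate why the null check matters, the sketch below compares size and count on a made-up frame that deliberately contains one missing genus. The data are illustrative only.

```python
import pandas as pd

# Made-up rows with one missing genus to show how the two methods differ.
df = pd.DataFrame({
    "neighbourhood_name": ["Sunset", "Sunset", "Kitsilano"],
    "genus_name": ["ACER", None, "ACER"],
})

# Number of missing values per column.
null_counts = df.isna().sum()

# size counts all rows in each group, including rows with missing values.
sizes = df.groupby("neighbourhood_name").size()

# count only counts the non-null entries of the selected column.
counts = df.groupby("neighbourhood_name")["genus_name"].count()
```

When null_counts is zero for the columns involved, as in the notebook's data, the two results agree and size is the simpler choice.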

Exploratory Visualisations

Question 1: The neighbourhoods where fall foliage and spring blooms are mostly observed

Visualising the data on a map is the best way to find out which neighbourhoods have the best fall and spring colors. To obtain the map, the dataset is grouped by neighbourhood, and the genus and coordinate values are aggregated as follows.

Maps are obtained for spring and fall, along with a point plot giving the genus count for each neighbourhood.
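The aggregation could look roughly like the sketch below. It assumes the genus is counted per neighbourhood and the coordinates are averaged to give one point per neighbourhood; the actual aggregation functions used in the notebook are not stated, and the rows and the name neighbourhood_summary are made up.

```python
import pandas as pd

# Made-up rows standing in for the fall dataframe.
fall_trees = pd.DataFrame({
    "neighbourhood_name": ["Sunset", "Sunset", "Kitsilano"],
    "genus_name": ["ACER", "QUERCUS", "ACER"],
    "latitude": [49.0, 49.5, 49.25],
    "longitude": [-123.0, -123.5, -123.25],
})

# Named aggregation: one row per neighbourhood with a tree count
# and the mean coordinates (assumed choice of aggregation).
neighbourhood_summary = (
    fall_trees.groupby("neighbourhood_name")
    .agg(
        genus_count=("genus_name", "count"),
        latitude=("latitude", "mean"),
        longitude=("longitude", "mean"),
    )
    .reset_index()
)
```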

The following code gives the top 5 neighbourhood names.
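A minimal sketch of picking the five neighbourhoods with the most trees is shown below. The neighbourhood labels and counts are placeholders, and value_counts is only one of several ways this could have been done.

```python
import pandas as pd

# Placeholder neighbourhood labels with made-up, distinct counts.
names = ["A"] * 6 + ["B"] * 5 + ["C"] * 4 + ["D"] * 3 + ["E"] * 2 + ["F"]
fall_trees = pd.DataFrame({"neighbourhood_name": names})

# value_counts sorts in descending order, so head(5) gives the top five.
top5 = fall_trees["neighbourhood_name"].value_counts().head(5)
top5_names = top5.index.tolist()
```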

From the plots it can be seen that the top neighbourhoods for fall genera are 'Kensington-Cedar Cottage', 'Renfrew-Collingwood', 'Hastings-Sunrise', 'Victoria-Fraserview', 'Dunbar-Southlands' and 'Sunset', and similarly for spring they are 'Renfrew-Collingwood', 'Hastings-Sunrise', 'Kensington-Cedar Cottage', 'Victoria-Fraserview', 'Sunset' and 'Dunbar-Southlands'. If you are visiting Vancouver, I would recommend visiting these neighbourhoods to get the most of the Vancouver colors!

Our next task is to find the most common genera responsible for the foliage and bloom. For that, a bar chart of genus count against genus name is drawn, sorted so that the most common genera come first, as shown below. A dropdown selection is also provided to show the most common genera within each neighbourhood.

For further analysis, the most common genera for fall and spring will be used to find the streets with the most colours. As we can see from the plot above, there is a considerable difference between the counts of these genera and the rest.

Question 2: Filter the dataset for the common genera for fall and spring and find the streets with the best colors

For further analysis, the spring and fall dataframes need to be filtered based on the common_genus_spring and common_genus_fall lists as follows.

By grouping the data by neighbourhood name, genus name and on_street columns we get the streets in which we have the most color.
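The filtering and grouping for this question can be sketched as follows. The contents of common_genus_fall, the rows and the name street_counts are placeholders; the column names are assumed to match the dataset.

```python
import pandas as pd

# Placeholder for the list of most common fall genera.
common_genus_fall = ["ACER"]

# Made-up rows standing in for the fall dataframe.
fall_trees = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano", "Kitsilano", "Kitsilano", "Sunset", "Sunset"],
    "genus_name": ["ACER", "ACER", "ACER", "ACER", "QUERCUS"],
    "on_street": ["W 6TH AV", "W 6TH AV", "W 11TH AV", "KINGSWAY", "KINGSWAY"],
})

# Keep only the common genera, then count trees per neighbourhood, genus and street.
fall_common = fall_trees[fall_trees["genus_name"].isin(common_genus_fall)]
street_counts = (
    fall_common.groupby(["neighbourhood_name", "genus_name", "on_street"])
    .size()
    .reset_index(name="genus_count")
    .sort_values("genus_count", ascending=False)
)
```

Because the neighbourhood is one of the grouping keys, a street that runs through several neighbourhoods appears once per neighbourhood in the result.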

The heatmaps for the common genera are obtained as follows.

So, in general, going to the streets on these plots would give one the most of the fall and spring colors. According to the data, the top 5 streets for fall colors are W 6TH AV, W 11TH AV, W 15TH AV, KINGSWAY and ANGUS DRIVE. Similarly, the top 5 streets where spring bloom is observed are W 59TH AV, DUMFRIES ST, W 22ND AV, RUPERT ST and DUMFRIES ST.

Question 3: Finding out the block location of the Trees

The genus count per street doesn't give the full picture, since the trees may not even be on the same block. Grouping the data to include the block information gives a better idea of the exact location where we would be able to find the trees.
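The point about blocks can be illustrated with made-up data: in the sketch below a long street has the highest street-level count, but its trees are scattered over four blocks, while a shorter street has them concentrated on one. The column name on_street_block is assumed from the dataset; the counts are invented.

```python
import pandas as pd

# Made-up rows: KINGSWAY has more trees overall, but spread over many blocks.
trees = pd.DataFrame({
    "on_street": ["BUTLER ST"] * 3 + ["KINGSWAY"] * 4,
    "on_street_block": [7700, 7700, 7700, 100, 900, 2300, 3400],
})

# Street-level counts hide how the trees are distributed along the street.
per_street = trees.groupby("on_street").size()

# Adding the block to the grouping keys shows where the trees cluster.
per_block = (
    trees.groupby(["on_street", "on_street_block"])
    .size()
    .reset_index(name="genus_count")
    .sort_values("genus_count", ascending=False)
)
```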

From the above data it can be seen that the top 5 spots to see the fall colors are the 7700 block of Butler St, the 100 block of Athletes Way, the 1400 block of E 20th Av, the 7700 block of Sparbrook Crescent and the 3500 block of W 30th Av. Similarly, the spring bloom can be observed on the 7700 block of Butler St, the 1400 block of E 20th Av, the 7700 block of Sparbrook Crescent, the 2300 block of Harrison Drive and the 4400 block of W 10th Av.

Combining the plots for Analysis

The following plot uses click selection to show the distribution of an individual genus across all the neighbourhoods in fall and spring.

The above plot shows the distribution of the most common genera per street. The dropdown menu lets us check whether a street belongs to the neighbourhood we are interested in. A tooltip is provided to help us keep track of the genus count.

Now let's explore the exact locations where we could see groups of trees to get the best view. The filtered dataframes for both spring and fall are plotted as shown. Even though the street block is stored as an int value, we need to specify it as ordinal data, because it is a street block number, not a continuous value.

A tooltip is provided to find out the exact location where the desired colors are found.

Concluding Remarks

The EDA was helpful in exploring the data and in obtaining different visualizations. It helped me decide which plots are useful for answering the questions I put forward. To answer question 1, I realized that the map along with the point plot would be helpful. To answer question 2, different plots were tried out and the mark_circle plot was chosen for inclusion in the final_report. To answer the third question, a mark_circle plot is also used. Since finding the genus location is our aim, the street name is plotted against the block number, with a tooltip on each point; the widgets in this plot allow selection by genus name, and the genus count can be filtered using the slider.